Entity coreference resolution
TODO: comment on datasets Klappholz and Lockman (1977;

History

From Hirst (1981) [Hirst, G. (1981). Anaphora in Natural Language Understanding: A Survey. Brown University.]: "The high-school algebra problem answering system STUDENT (Bobrow 1964), an early system with natural language input, has only a few limited heuristics for resolving anaphors and, more particularly, anaphor-like paraphrases and incomplete repetitions. ... Winograd's (1971, 1972) celebrated SHRDLU system ... providing impressive and, for the most part, sophisticated handling of anaphors, including references to earlier parts of the conversation between the program and its user."

Terminology: Winograd (1972, p. 30) [Winograd, T. (1972). Understanding natural language. Cognitive Psychology, 3(1), 1–191. http://doi.org/10.1016/0010-0285(72)90002-3] uses the terms "back-reference" and "pronoun reference".

Applications

TODO: Recasens and Hovy (2009) [Recasens, M., & Hovy, E. (2009). A deeper look into features for coreference resolution. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 5847 LNAI, 29–42. http://doi.org/10.1007/978-3-642-04975-0_3]: "Coreference resolution ... has been shown to be beneficial in many NLP applications such as Information Extraction [McCarthy, J.F., Lehnert, W.G.: Using decision trees for coreference resolution. In: Proceedings of IJCAI. (1995) 1050–1055], Text Summarization [Steinberger, J., Poesio, M., Kabadjov, M.A., Ježek, K.: Two uses of anaphora resolution in summarization. Information Processing and Management: an International Journal 43(6) (2007) 1663–1680], Question Answering [Morton, T.S.: Using coreference in question answering. In: Proceedings of the Text REtrieval Conference 8. (1999) 85–89], and Machine Translation."

Information extraction

From Soon et al. (2001) [Soon, Wee Meng, Hwee Tou Ng, and Daniel Chung Yong Lim. "A machine learning approach to coreference resolution of noun phrases." Computational Linguistics 27, no. 4 (2001): 521-544.]: "information extraction (IE) systems like those built in the DARPA Message Understanding Conferences (Chinchor 1998; Sundheim 1995) have revealed that coreference resolution is such a critical component of IE systems"

Analysis

TODO: Hajishirzi et al. (2013) [Hajishirzi, H., Zilles, L., Weld, D. S., & Zettlemoyer, L. (2013). Joint Coreference Resolution and Named-Entity Linking with Multi-pass Sieves. In EMNLP ’13 (pp. 289–299).]: "The biggest challenge in coreference resolution -- accounting for 42% of errors in the state-of-the-art Stanford system -- is the inability to reason effectively about background semantic knowledge (Lee et al., 2013)."

Error analysis

* Recall analysis: Martschat & Strube (2014) [Martschat, S., & Strube, M. (2014). Recall Error Analysis for Coreference Resolution. EMNLP, 2070–2081.]
* Link-based error analysis (Uryupina, 2008 [Olga Uryupina. 2008. Error analysis for learning-based coreference resolution. In Proceedings of the 6th International Conference on Language Resources and Evaluation, Marrakech, Morocco, 26 May – 1 June 2008, pages 1914–1919.]; Martschat, 2013 [Sebastian Martschat. 2013. Multigraph clustering for unsupervised coreference resolution. In 51st Annual Meeting of the Association for Computational Linguistics: Proceedings of the Student Research Workshop, Sofia, Bulgaria, 5–7 August 2013, pages 81–88.])
* Transformation-based error analysis (Kummerfeld and Klein, 2013 [Jonathan K. Kummerfeld and Dan Klein. 2013. Error-driven analysis of challenges in coreference resolution. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Wash., 18–21 October 2013, pages 265–277.])

Subproblems

Calculate scores for nominals, pronouns and proper names separately (Ng and Cardie, 2002 [Vincent Ng and Claire Cardie. 2002. Improving machine learning approaches to coreference resolution. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 104–111.]; Haghighi and Klein, 2009 [Aria Haghighi and Dan Klein. 2009. Simple coreference resolution with rich syntactic and semantic features. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 1152–1161.])

Broad-referring expressions

"It" is a particularly difficult case for coreference resolution. It might refer to singular inanimate objects, some animals, or abstractions/events, or to nothing specific at all (pleonastic uses, as in "it is raining"). However, see also Li et al. (2009) [Li, Yifan, Petr Musilek, Marek Reformat, and Loren Wyard-Scott. "Identification of pleonastic it using the web." Journal of Artificial Intelligence Research 34 (2009): 339-389.], in which the authors claim to identify pleonastic "it" with accuracy "comparable to those obtained by human efforts".

"They" is slightly easier than "it" but still more difficult than other pronouns.

"This" and "that" can also refer to abstractions, which makes their range of possible referents rather broad (McShane and Babkin, 2015) [McShane, M., & Babkin, P. (2015). Resolving Difficult Referring Expressions, 1–21.]. However, in most of the cases I found in OntoNotes, they are followed by a noun, as in "this area", "this facility", etc.

TODO: some special treatments: Kolhatkar and Hirst (2011) [Kolhatkar, V., & Hirst, G. (2011). Resolving "This-issue" Anaphora.], Müller (2008) [Müller, M.-C. (2008). Fully Automatic Resolution of "it", "this", and "that" in Unrestricted Multi-Party Dialog.].

Kolhatkar et al. (2013) [Kolhatkar, Varada, Heike Zinsmeister, and Graeme Hirst. "Annotating Anaphoric Shell Nouns with their Antecedents." LAW@ACL. 2013.]: "Anaphoric shell nouns (ASNs) such as this fact, this possibility, and this issue are common in all kinds of text.
They are called shell nouns because they provide nominal conceptual shells for complex chunks of information representing abstract concepts such as fact, proposition, and event (Schmid, 2000)."

Pronouns

From Wiseman et al. (2016) [Wiseman, Sam, Alexander M. Rush, and Stuart M. Shieber. "Learning Global Features for Coreference Resolution." arXiv preprint arXiv:1604.03035 (2016).]: "Wiseman et al. (2015) show that on the CoNLL 2012 English development set, almost 59% of mention-ranking precision errors and almost 24% of recall errors involve pronominal mentions. Martschat and Strube (2015) found a similar pattern in their comparison of mention-ranking, mention-pair, and latent-tree models."

Opaque mentions

Recasens et al. (2013) [Recasens, Marta, Matthew Can, and Daniel Jurafsky. "Same Referent, Different Words: Unsupervised Mining of Opaque Coreferent Mentions." HLT-NAACL. 2013.]: "Coreference resolution systems rely heavily on string overlap (e.g., Google Inc. and Google), performing badly on mentions with very different words (opaque mentions) like Google and the search giant."

Types of coreferencing expressions

TODO: referential hierarchies of Ariel (1988) [M. Ariel. 1988. Referring and accessibility. Journal of Linguistics, pages 65–87.] or Gundel et al. (1993) [J. K. Gundel, N. Hedberg, and R. Zacharski. 1993. Cognitive status and the form of referring expressions in discourse. Language, 69:274–307.]

Open problems

From McShane and Babkin (2016) [Marjorie McShane, Petr Babkin. 2016. Resolving Difficult Referring Expressions.]: "Among the more difficult referring expressions are so-called broad referring expressions, such as pronominal this and that ... In addition to untreated referring expressions, there are referring expressions that have been widely treated but have resisted high-precision results. One example is third person personal pronouns.
The reason for the low precision is that resolution often requires specific world knowledge and reasoning, as illustrated by Winograd Schema examples like The man_i could not lift his son_k because he_i was so weak / he_k was so heavy (Levesque et al., 2012)."

Linguistic theories

Centering

Focusing

Discourse representation theory

TODO: (Cormack, 1993 [Cormack, S. 1993. Anaphora Resolution in Discourse Representation Theory. PhD thesis, University of Edinburgh.]; Abraços and Lopes, 1994 [Abraços, J. and J.G. Lopes. 1994. Extending DRT with a focusing mechanism for pronominal anaphora and ellipsis resolution. Proceedings of the 15th conference on Computational linguistics-Volume 2, pages 1128–1132.])

I can't get a PDF file of Cormack (1993). Reading Abraços and Lopes (1994), they seem to take a very different approach to coreference resolution. They propose and evaluate rules such as the "recency rule" (look at the last constituent in the previous sentence), focus movement (what changes the focus), relative clauses (the following sentence tends to bind to the main clause instead of the relative clause), etc. To me these rules look brittle and ad hoc. Besides, there is no obvious mechanism for resolving rule conflicts.

Approaches

Classified based on inferencing method

* Rule-based
* Inference-based: Inoue et al. (2012) [Inoue, N., Ovchinnikova, E., Inui, K., & Hobbs, J. (2012). Coreference Resolution with ILP-based Weighted Abduction. In COLING (pp. 1291-1308).]
* Machine learning: see Elango (2005) [Elango, Pradheep. "Coreference resolution: A survey." University of Wisconsin, Madison, WI (2005).]
** Naïve Bayes
** Decision tree
** Conditional random fields (McCallum and Wellner 2005 [McCallum, A., & Wellner, B. (2005). Conditional Models of Identity Uncertainty with Application to Noun Coreference. Advances in Neural Information Processing Systems 17, 905–912.], etc.)
** Integer Linear Programming (a review: Rizzolo and Roth (2016) [Rizzolo, N., & Roth, D. (2016). Integer Linear Programming for Coreference Resolution. In Anaphora resolution, Theory and Applications of Natural Language Processing (Vol. 11, pp. 315–343). http://doi.org/10.1017/S1351324905214006])
** Markov logic:
*** Poon & Domingos (2008) [Poon, H. & Domingos, P. (2008). Joint unsupervised coreference resolution with Markov Logic. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Waikiki, Honolulu, Hawaii, 25–27 October 2008, pages 650–659.]: use MLN to encode (soft) rules such as type/number/gender matching, apposition (e.g. Bill Gates, the chairman of Microsoft) and predicate nominals (e.g. he is Bill Gates).
*** Bögel & Frank (2013) [Bögel, T. & Frank, A. (2013). A joint inference architecture for global coreference clustering with anaphoricity. In Gurevych, I., Biemann, C., & Zesch, T. (Eds.), Language Processing and Knowledge in the Web, pages 35–46. Berlin, Heidelberg: Springer (Lecture Notes in Computer Science, 8105).]: TODO
** Neural networks: Clark (2015) [Clark, K. (2015). Neural Coreference Resolution.], Clark and Manning (2016a, 2016b [Clark, K., & Manning, C. D. (2016b). Deep Reinforcement Learning for Mention-Ranking Coreference Models. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP-16), 2256–2262. Retrieved from http://arxiv.org/abs/1609.08667]), Wiseman et al. (2016)

Classified based on source of information

Semantic knowledge

Lee et al. (2013): "Haghighi and Klein found that this transductive learning was essential for semantic knowledge to be useful (Aria Haghighi, personal communication); other researchers have found that semantic knowledge derived from Web resources can be quite noisy (Uryupina et al. 2011a)."

Discourse-based

Discourse-based methods take into account aspects of discourse such as coherence and centering. From Lappin and Leass (1994) [Lappin, S., & Leass, H. J. (1994). An Algorithm for Pronominal Anaphora Resolution. Computational Linguistics, 20(4), 535–561. Retrieved from http://dl.acm.org/citation.cfm?id=203989]: "Discourse Based Methods Most of the work in this area seeks to formulate general principles of discourse structure and interpretation and to integrate methods of anaphora resolution into a computational model of discourse interpretation (and sometimes of generation as well). Sidner (1981, 1983), Grosz, Joshi, and Weinstein (1983, 1986), Grosz and Sidner (1986), Brennan, Friedman, and Pollard (1987), and Webber (1988) present different versions of this approach. Dynamic properties of discourse, especially coherence and focusing, are invoked as the primary basis for identifying antecedence candidates; selecting a candidate as the antecedent of a pronoun in discourse involves additional constraints of a syntactic, semantic, and pragmatic nature."

Potential problems:
* From Lappin and Leass (1994): "... assign too dominant a role to coherence and focus in antecedent selection. As a result, they establish a strong preference for intersentential over intrasentential anaphora resolution. This is the case with the anaphora resolution algorithm described by Brennan, Friedman, and Pollard (1987)."
* Alshawi (1987, p. 62; as cited in Lappin and Leass, 1994): an algorithm/model relying on the relative salience of all entities evoked by a text, with a mechanism for removing or filtering entities whose salience falls below a threshold, is preferable to models that "make assumptions about a single (if shifting) focus of attention."

Mixed models

Combining syntactic, semantic, and discourse factors, etc. Examples: Lappin and Leass (1994), Asher and Wada (1988), Carbonell and Brown (1988), and Rich and LuperFoy (1988)

Classified based on the construction of coreference chain

: See also: Ng (2010) [Ng, V. (2010). Supervised Noun Phrase Coreference Research: The First Fifteen Years. ACL ’10, (July), 1396–1411. http://doi.org/10.1109/TVCG.2007.24], Heng Ji's slide, Martschat and Strube (2015)

TODO: latent-tree models?

To construct a coreference chain, one can consider each element separately or match one candidate element to a partial chain. There are several major approaches to this problem:
* Mention-pair model: whether two mentions are coreferential or not
** (Soon et al. 2001; Ng and Cardie 2002; Ji et al., 2005; McCallum & Wellner, 2004; Nicolae & Nicolae, 2006)
** Chang et al. (2013) [Chang, K.-W., Samdani, R., & Roth, D. (2013). A Constrained Latent Variable Model for Coreference Resolution. In EMNLP.]: "We model the task of coreference resolution using a pairwise scorer which indicates the compatibility of a pair of mentions. The inference routine then predicts the final clustering -- a structured prediction problem -- using these pairwise scores."
* Entity-mention model: whether a mention and a preceding (partial) cluster are coreferential or not
** Ref:
*** Pasula et al. 2003 [Pasula, H., Marthi, B., Milch, B., … (2003). Identity uncertainty and citation matching. Retrieved from http://papers.nips.cc/paper/2149-identity-uncertainty-and-citation-matching.pdf] <-- citation matching;
*** Luo et al. 2004 [Luo, X., Ittycheriah, A., Jing, H., … (2004). A mention-synchronous coreference resolution algorithm based on the bell tree. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics.]; Yang et al. 2004, 2008; Daume & Marcu, 2005; Culotta et al., 2007; Lee et al., 2013 [Lee, H., Chang, A., Peirsman, Y., Chambers, N., Surdeanu, M., & Jurafsky, D. (2013). Deterministic Coreference Resolution Based on Entity-Centric, Precision-Ranked Rules. Computational Linguistics, 39(4), 885–916. doi:10.1162/COLI]
** antecedent trees (Yu and Joachims, 2009 [Yu, C.-N. J., & Joachims, T. (2009). Learning structural SVMs with latent variables. Proceedings of the 26th Annual International Conference on Machine Learning - ICML ’09, 1–8. http://doi.org/10.1145/1553374.1553523]; Fernandes et al., 2014 [Fernandes, E. R., dos Santos, C. N., & Milidiú, R. L. (2014). Latent Trees for Coreference Resolution. Computational Linguistics, 40(4). http://doi.org/10.1162/COLI_a_00200]; Björkelund and Kuhn, 2014)
*** From Fernandes et al.: "We introduce coreference trees to represent mention clusters. A coreference tree is a directed tree whose nodes are the coreferring mentions in a cluster and whose arcs"
* Mention-ranking model (also called mention-synchronous [For example, in Durrett and Klein (2013).]): which of the preceding mentions is coreferential to a given mention
** Ref: Denis & Baldridge 2007, 2008
** Special case: rank two candidate NPs, called the tournament model by Iida et al. (2003) [Ryu Iida, Kentaro Inui, Hiroya Takamura, and Yuji Matsumoto. 2003. Incorporating contextual cues in trainable models for coreference resolution. In Proceedings of the EACL Workshop on The Computational Treatment of Anaphora.] and the twin-candidate model by Yang et al. (2003 [Xiaofeng Yang, Guodong Zhou, Jian Su, and Chew Lim Tan. 2003. Coreference resolution using competitive learning approach. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 176–183.]; 2008b [Xiaofeng Yang, Jian Su, and Chew Lim Tan. 2008b. A twin-candidate model for learning-based anaphora resolution. Computational Linguistics, 34(3):327–356.])
* Cluster-ranking model: which of the preceding clusters is coreferential to a given mention
** Ref: Rahman and Ng (2009) [Altaf Rahman and Vincent Ng. 2009. Supervised models for coreference resolution. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 968–977.]
* Merging clusters (sometimes called entity-centric):
** From Clark and Manning (2015) [Clark, Kevin, and Christopher D. Manning. "Entity-centric coreference resolution with model stacking." Association of Computational Linguistics (ACL). 2015.]: "Our entity-centric “agent” builds up coreference chains with agglomerative clustering. It begins in a start state where each mention is in a separate single-element cluster. At each step, it observes the current state s, which consists of all partially formed coreference clusters produced so far, and selects some action a which merges two existing clusters. The action will result in a new state with new candidate actions and the process is repeated. The model is entity-centric in that it builds"
* TODO: transition-based approach (somewhat between mention ranking and cluster merging?): Webster and Curran (2014) [Webster, K., & Curran, J. R. (2014). Limited memory incremental coreference resolution. In COLING (pp. 2129–2139).]

From Ng (2010): "An important issue with ranking models that we have eluded so far concerns the identification of non-anaphoric NPs. As a ranker simply imposes a ranking on candidate antecedents or preceding clusters, it cannot determine whether an NP is anaphoric (and hence should be resolved). To address this problem, Denis and Baldridge (2008) apply an independently trained anaphoricity classifier to identify non-anaphoric NPs prior to ranking, and Rahman and Ng (2009) propose a model that jointly learns coreference and anaphoricity"

Entity-level models are expected to perform better than mention-level models because the former have access to more information. An example from Lee et al. (2013): As an illustration, the following text shows an example where the incorrect decision is taken if feature sharing is disabled: This was the best result of a Chinese gymnast in 4 days of competition... It was the best result for Greek gymnasts since they began taking part in gymnastic internationals.
In the example text, the mention-pair model incorrectly links This and It, because all the features that can be extracted locally are compatible (e.g., number is singular for both pronouns). On the other hand, the entity-centric model avoids this decision because, in a previous sieve driven by predicate nominative relations, these pronouns are each linked to incompatible noun phrases, i.e., the best result of a Chinese gymnast and the best result for Greek gymnasts.

Features

See Features for entity coreference resolution

Designing cluster(entity)-level features

From Clark and Manning (2016) [Clark, K., & Manning, C. D. (2016a). Improving Coreference Resolution by Learning Entity-Level Distributed Representations. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 643–653. http://doi.org/10.18653/v1/P16-1061]: "A long-standing challenge in coreference resolution has been the incorporation of entity-level information -- features defined over clusters of mentions instead of mention pairs."

From Wiseman et al. (2016): "We believe a major reason for the relative ineffectiveness of global features in coreference problems is that, as noted by Clark and Manning (2015), cluster-level features can be hard to define"

Approaches:
* Individual: score_cluster(c1, c2) = pooling over all score_mention(m1, m2) with m1 in c1 and m2 in c2 -- This approach represents the relationships between individual mentions of the two clusters. It is used in Clark and Manning (2016).
* Summary: score_cluster(c1, c2) = score(summarize(c1), summarize(c2)), where the summarize function iterates through the mentions of a cluster and returns a shared representation. -- This approach stresses the cluster representation while sacrificing the relationships between mentions across clusters. Though in theory it can still capture such relationships through the summary representation, a large part of the information is likely lost.
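
The two approaches above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the model of Clark and Manning (2016) or Wiseman et al. (2016): mentions are assumed to be small feature vectors, the mention-pair scorer is a plain dot product, the summary is an element-wise mean, and the names dot, score_individual, summarize and score_summary are hypothetical.

```python
from itertools import product
from statistics import mean
from typing import Callable, Iterable, List, Sequence

Vector = Sequence[float]
Cluster = Sequence[Vector]  # assumption: a cluster is a list of mention vectors


def dot(u: Vector, v: Vector) -> float:
    """Toy scorer (assumption): dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def score_individual(
    c1: Cluster,
    c2: Cluster,
    score_mention: Callable[[Vector, Vector], float] = dot,
    pool: Callable[[Iterable[float]], float] = max,
) -> float:
    """Individual approach: score every cross-cluster mention pair, then pool."""
    return pool(score_mention(m1, m2) for m1, m2 in product(c1, c2))


def summarize(cluster: Cluster) -> List[float]:
    """Toy summary (assumption): element-wise mean of the mention vectors."""
    return [mean(dim) for dim in zip(*cluster)]


def score_summary(
    c1: Cluster,
    c2: Cluster,
    score: Callable[[Vector, Vector], float] = dot,
) -> float:
    """Summary approach: compare one shared representation per cluster."""
    return score(summarize(c1), summarize(c2))


c1 = [[1.0, 0.0], [0.0, 1.0]]  # two dissimilar mentions in one cluster
c2 = [[1.0, 0.0]]              # a single mention

individual_max = score_individual(c1, c2)              # 1.0
individual_avg = score_individual(c1, c2, pool=mean)   # 0.5
summary = score_summary(c1, c2)                        # 0.5
```

With max pooling, the individual score keeps the one strongly matching mention pair, whereas the summary score averages it away before any comparison happens. That is the information loss described in the Summary item, shown here only for this particular toy choice of summarize.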
The crudest form of summarization is rules like "some mentions are names", "all mentions are singular", etc. "Early attempts at defining cluster-level features simply applied the coarse quantifier predicates all, none, most to the mention-level features defined on the mentions (or pairs of mentions) in a cluster (Culotta et al., 2007; Rahman and Ng, 2011)." (Wiseman et al. 2016) At the other extreme, Björkelund and Kuhn (2014) [Anders Björkelund and Jonas Kuhn. 2014. Learning Structured Perceptrons for Coreference Resolution with Latent Antecedents and Non-local Features. ACL, Baltimore, MD, USA, June.] attempt to preserve information by concatenating information found in mentions (e.g. a feature of C-P-P for a cluster that has a common noun followed by two pronouns).

TODO: Wiseman et al. (2016): "Bjorkelund and Kuhn (2014), Martschat and Strube (2015) [Sebastian Martschat and Michael Strube. 2015. Latent structures for coreference resolution. TACL, 3:405–418.], Clark and Manning (2015)"

Local vs. global features

From Wiseman et al. (2016): "we might expect non-local models with access to global features to perform significantly better. However, models incorporating nonlocal features have a rather mixed track record. For instance, Bjorkelund and Kuhn (2014) found that cluster-level features improved their results, whereas Martschat and Strube (2015) found that they did not. Clark and Manning (2015) found that incorporating cluster-level features beyond those involving the precomputed mention-pair and mention-ranking probabilities that form the basis of their agglomerative clustering coreference system did not improve performance. Furthermore, among recent, state-of-the-art systems, mention-ranking systems (which are completely local) perform at least as well as their more structured counterparts (Durrett and Klein, 2014; Clark and Manning, 2015; Wiseman et al., 2015; Peng et al., 2015)."

Pipeline

Preceding tasks

Syntax, NER, etc.

Mention detection

Mention classification?
Anaphoric identification

Ng and Cardie (2002) [Ng, V., & Cardie, C. (2002). Identifying anaphoric and non-anaphoric noun phrases to improve coreference resolution. Proceedings of the 19th International Conference on Computational Linguistics, 1(1987), 1–7. http://doi.org/10.3115/1072228.1072367]

Coreference resolution

Creating training examples: problem of class imbalance. Recasens and Hovy (2009) find balancing training instances to be ineffective for TiMBL.

Open-source software and experiments

See also Survey of open-source systems.
* Martschat and Strube (2015): http://smartschat.de/software
* Wiseman et al. (2016): https://github.com/swiseman/nn_coref
* Clark and Manning (2016): https://github.com/clarkkev/deep-coref
* Hybrid Coref (Lee et al. 2017) [Lee, Heeyoung, Mihai Surdeanu, and Dan Jurafsky. "A scaffolding approach to coreference resolution integrating statistical and rule-based models." Natural Language Engineering (2017): 1-30.]: https://github.com/heeyounglee/hcoref
* cort - coreference resolution toolkit
* dcoref - part of Stanford CoreNLP
* Berkeley Coreference Resolution System
* Illinois Coreference Package
* xrenner - eXternally configurable REference and Non Named Entity Recognizer
* e2e-coref - end-to-end coreference resolution system from AllenAI

TODO http://aclweb.org/anthology/N/N15/N15-1082.pdf, Winograd schema. "advanced model for CR"

See also
* State-of-the-art
* Coreference (psycholinguistics)
* Entity coreference datasets
* Survey by Sapena et al. (2008)
* Examples of entity coreference resolution

References

Category:Entity coreference resolution